ParceLiNGAM: A causal ordering method robust 
against latent confounders 

Tatsuya Tashiro* Shohei Shimizu| Aapo Hyvarinen^and Takashi Washio§ 



Abstract 

We consider learning a causal ordering of variables in a linear non- Gaussian 
acyclic model called LiNGAM. Several existing methods have been shown to 
consistently estimate a causal ordering assuming that all the model assump- 
tions are correct. But, the estimation results could be distorted if some as- 
sumptions actually are violated. In this paper, we propose a new algorithm for 
learning causal orders that is robust against one typical violation of the model 
assumptions: latent confounders. The key idea is to detect latent confounders 
by testing independence between estimated external influences and find subsets 
(parcels) that include variables that are not affected by latent confounders. We 
demonstrate the effectiveness of our method using artificial data and simulated 
brain imaging data. 



1 Introduction 

Baycsian networks have be en wide l y used to analyze causal relations of variables 
in ma ny empirical sciences (|BolleiAll989tlPear]Ll2005lSpirtes. Glvmour. fc Schemes! . 
1993h . A common assumption is linear- Gaussianity. But this poses serious iden- 
tifiability problems so that many important models arc indist inguishable with no 



prior knowledge on the structures. Recently, it was shown by (jShimizu. Hover. Hvvarinen. fc Kerminen , 
20061 ) that the utilization of non-Gaussianity allows the full structure of a linear 
acyclic model to be identified without pre-specifying any causal orders of vari- 
ables. The new mode l, a Linear Non-Gaussian Acyclic Model called LiNGAM 
' 2006h. is closely related t o independent component analysis 



( Shimizuetal 



(ICA) (jHwarinen. Karhunen. fc Oial 20011). 

Most existing estimation methods (jShimizu et al. . 20061 2011 ; Hvvarinen fc Smith . 

l2013h for LiNGAM learn causal orders assuming that all the model assumptions 
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hold. Therefore, these algorithms could return completely wrong estimation re- 
sults when some of the model assumptions are violated. Thus, in this paper, we 
propose a new algorithm for learning causal orders that is robust against one 
typical model violation, i.e., latent confoundcrs. A latent confoundcr means a 
variable which is not observed but which exerts a causal influence on some of 
the observed variables. Many re al- world applications including brain imaging 



data analysis ([Smith et all 120111 ) could benefit from our approach. 

This papeiu is organized as follows. We first re view LiNGAM ( Shimizu et al 



2006 ) and its extension to latent confounder cases (jHover. Shimizu. Kerminen. fc Palviainen . 



20081 ) in Section [5] In Section [3J we propose a new algorithm to learn causal 
orders in LiNGAM with latent confounders. We empirically evaluate the per- 
formance of our algorithm using artificial data in Section [4] and simulated fMRI 
data in Section [5] We conclude this paper in Section [6] 



2 Background: LiNGAM with latent confounders 



We b riefly review a linear non-Gaussian acyclic model called LiNGAM ([Shimizu et al.l 
20061) and an extension of the LiNGAM to cases with latent confounding vari- 



ables dHover et all. 120081) 



(i 



In LiNGAM (jShimizu et al 
= !,-•• , d) are modeled as: 



2006), causal relations of observed variables 



E 

k(j)<k{i) 



(1) 



where k(i) is a causal ordering of the variables x,. In this ordering, the variables 
Xi graphically form a directed acyclic graph (DAG) so that no later variable 
determines, i.e., has a directed path on any earlier variable. The e» are external 
influences, and bij are connection strengths. In matrix form, the model (TIJ is 
written as 



x = Hx + e, 



(2) 



where the connection strength matrix B collects and the vectors x and e col- 
lect Xi and e,. Note that the matrix B can be permuted to be lower triangular 
with all zeros on the diagonal if simultaneous equal row and column permu- 
tations are made according to a causal ordering k(i) because of the acyclicity. 
The zero/non-zero pattern of bij corresponds to the absence/existence pattern 
of directed edges. External influences e, follow non-Gaussian continuous distri- 
butions with zero mean and non-zero variance and are mutually independent. 
The non-Gaussianity assu mption on e,; enables i dentification of a causal ordering 
k(i) based on data x only (jShimizu et all 120061 ) . This feature is a major advan- 
tage o ver conventional Ba yesian networks based on the Gaussianity assumption 
on a (jSpirtes et all Il993l) . 



1 S ome preliminary results were presented in (iTashiro, Shimizu, Hvvarinen. &: Washiol 
l2012t) , which corresponds to Section l3.1l of this paper. 
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Next, LiNGAM with latent confounders ( Hover et al. , 20081) can be formu- 
lated as follows: 



Bx + Af + e, 



(3) 



where the difference with LiNGAM in Eq. @ is the existence of latent confound- 
ing variable vector /. A latent confounding variable is a latent variable that is a 
parent of more than one observed variable. The vector / collects non-Gaussian 
latent confounders fj with zero mean and no n-zero variance (j = ,<?). 
Without loss of generality ( Hover et al. . 20081 ). latent confounders fj are as- 



sumed to be mutually independent. The matrix A collects Ay which denotes 
the connection strength from fj to x%. For each j, at least two Ay are non-zero 



since a latent confoundcr is defined to have at least two children (jHover et al 
2008). The matrix A is assumed to be of full column rank. 



The central problem of causal discovery based on the latent variable LiNGAM 
in Eq. ([3]) is to estimate as many of causal orders k(i) and connection strengths 
bij as possible based on data x only. This is because in many cases only an 
equivalence class of the true model whose members p roduce the exact same 



observed distribution is identifiable (jHover et all 1200 



In ( Hover et al., 200 8]), a n estimation method based on overcomplete ICA 
( Lewicki. fe Seinowskil . l200f)h was proposed. However, overcomplete ICA meth- 
ods a re often not very reliable and get stuck in local optima. Thus, in (jEntner fe Hover 
20 111) , a method that does not use overcomplete ICA was proposed to first find 
variable pairs that are not affected by latent confounders and then estimate a 
causal ordering of one to the other. However, their method does not estimate a 
causal ordering of more than two variables. A simple cumulant-based method 
for estimatin g the model in the ca se of Gaussian latent confounders was further 
proposed by (ichen fc Chanl 120131) . 



3 A method robust against latent confounders 

In this section, we propose a new approach for estimating causal orders of more 
than two variables without explicitly modeling latent confounders. 



3.1 Identification of causal orders of variables that are not 
affected by latent confounders 

We first provide principles to identify an exogenous (root) variable and a sink 
variable which are such that are not affected by latent confounders in the latent 
variable LiNGAM in Eq. ([3]) (if such variable s exist) and next pres ent an estima- 



tion algorithm. Recent estimation m ethod s (Shim izu et all 120111) for LiNGAM 



in Eg. (El) and its nonlinear extension (Hover. Janzing. Mooii. Peters, fc Scholkopj . 



20091 iMooii. Janzing. Peters, fc Scholkopj . l2009l) learn a causal ordering by find- 
ing causal orders one by one either from the top downward or from the bottom 
upward assuming no latent confounders. We extend these ideas to latent con- 
founder cases. 
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We first generalize Lemma 1 of (jShimizu et all 120111) for the case of latent 
confounders. 



Lemma 1 Assume that all the model assumptions of the latent variable LiNGAM 
in Eq. |3P are met and the sample size is infinite. Denote by rj the residuals 
when xi are regressed on x^: r\ = xi — (i 7^ ])■ Then a variable 

Xj is an exogenous variable in the sense that it has no parent observed variable 
nor latent confounder if and only if Xj is independent of its residuals r| for all 

Next, we generalize the idea of ( Mooij et al. . 20091 ) for the case of latent con- 
founders. 

Lemma 2 Assume that all the model assumptions of the latent variable LiNGAM 
in Eq. f3J) are met and the sample size is infinite. Denote by Xf-j) a vec- 
tor that contains all the variables other than Xj . Denote by ^ the resid- 
ual when Xj is regressed on £E(_j), i-e., J ' = Xj — &T—j)j^7—j\ x (— ?")> where 



E = 



T 

J 3{-J) 

r i(-i) 



is the covariance matrix of [xj, xT^] . Then a vari- 
able Xj is a sink variable in the sense that it has no child observed variable nor 
latent confounder if and only if is independent of its residual r - ^ . [j 

The proofs of these lemmas are given in the appendix^) 

Thus, we can take a hybrid estimation approach that uses these two princi- 
ples. We first identify an exogenous variable by finding a variable that is most 
independent of its residuals and remove the effect of the exogenous variable 
from the other variables by regressing it out. We repeat this until independence 
between every variable and all of its residuals is statistically rejected. Depen- 
dency between every variable and any of its residuals implies that an exogenous 
variable as defined in Lemma Q] does not exist or some model assumption of 
latent variable LiNGAM in Eq. ([3]) is violated. Similarly, we next identify a 
sink variable in the remaining variables by finding a variable such that its re- 
grcssors and its residual arc most independent and disregard the sink variable. 
We repeat this until independence is statistically rejected for every variable^ 
To test independence, we first evaluate pairwise independence between vari- 
ables and the residua ls using a kernel-based independence measure called HSIC 
( Gretton et al. . 20081 ) and then comb ine the result ing p- values pi (i = 1, • • ■ , c) 



using a well-known Fisher's method ((Fisher, Il950h to compute the test statis 



tic — 2~Yfi = \ logPi, which follows the chi-square distribution with 2c degrees of 
freedom when all the pairs are independent. 

Since all the causal orders are not necessarily identifiable in the latent vari- 



able LiNGAM in Eq. (jHover et all 120081) , we here aim to estimate a d x d 



We prove t he lemmas without assuming the faithfulness (ISpirtes et all [l993) unlike our 
previous work dTashiro et a.1.1. l2012fl . 

3 The issue of multiple comparisons arises in this context, which we would like to study in 
future work. 
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causal ordering matrix C=[cy] that collects causal orderings between two vari- 
ables, which is defined as 

-1 if k(i) < k(j) 
1 if k(i) > k(j) (4) 
if it is unknown whether either of the two cases 
above (—1 or 1) is true. 

Thus, the estimation consists of the following steps: 

Algorithm 1: Hybrid estimation of causal orders of variables that are not affected 
by latent confoundcrs 

INPUT: Data matrix X and a threshold a 

1. Given a d-dimcnsional random vector a;, a dxn data matrix of the random 
vector as X and a significance level a, define U as the set of variable indices 
of a;, i.e., {1, • ■ • , d} and initialize an ordered list of variables Khead '■= 
and Kta-a ■= and m := 1. K head and K taa denote the first \K head \ 
variable indices and the last |-Kt a i/| variable indices respectively, where 
each of \Khead\ and |ift a iz| denotes the number of elements in the list. 

2. Let x := x and X := X and find causal orders one by one from the top 
downward: 

(a) Do the following steps for all j 6 U \ Khead'- Perform least squares 
regressions of Xi on Xj for all i S U \ Khead (i ^ j) and compute 
the residual vectors f^' and the residual matrix R/ J "). Then, find a 
variable x rn that is most independent of its residuals: 

x m = arg max P Fish er(xj, f (j) ), (5) 

j£U\K hsad 

where PFisher{xj, f^') is the p- value of the test statistic defined as 
- 2 £\ log{P H (xj , f\ i] ) } , where P H ( Xj , rf ] ) is the p- value of the HSIC . 

(b) Go to Step [3] if Ppisheriim, f^™ 1 ') < a, i.e., all independencies arc 
rejected. 

(c) Append 77i to the end of K head and let x := r {m) and X := R (m ). If 
|-?Oiead| = d — 1, append the remaining variable index to the end of 
Khead and terminate. Otherwise, go back to Step (|2"a|) . 

3. If lifheadl < d — 2, let x' = x and X' = X and U' := U \ Khead and find 
causal orders one by one from the bottom upward @: 



4 We do not examine remaining two variables in this step since it is already implied in 
Step [2] that some latent confounders exist. If there were no latent confoundcrs between the 
remaining two, their causal orders would have already been estimated in Step [2] 
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(a) Do the following steps for all j £U'\ K ta a ■ Collect all the variables 
except x'j in a vector x',_-y Perform least squares regressions of x'j 

on x'r_j\ and compute the residual r'j . Then, find such a variable 
x' m that its regressors and its residual are most independent: 

x' m = arg max PFisher(x'(-j), '"',■ )■ (6) 

(b) Terminate if PFi S her{x'^_ m y r'^ m ') < a, i.e., all independencies are 
rejected. 

(c) Append m to the top of K ta a and let x' = jc'^.X' = X£_ >. 
Terminate 4 if |Z7' \ K ta a \ < 3 and otherwise go back to Step (|3a| . 

4. Estimate a causal ordering matrix C based on Khead and K ta u as follows. 
Estimate cy by -1, i.e., fc(z) < in either of the following cases: i) i is 
earlier than j in Khead', h) * is earlier than j in K ta u] hi) z is in Khead and 
j is in iTfaif; iv) i is in K hea d and j is neither K hea d nor AT tM z. Estimate 
cy by 1, i.e., k(i) > k(j) in cither of the following cases: i) i is later than 
j in Khead] ii) « is later than j in A' tal/ ; iii) i is in K tai i and j is in K he ad] 
iv) i is in K ta u and j is neither Khead nor K ta u. Estimate cy by 0, i.e., 
the ordering is unknown if i and j arc neither in K ta a nor Khead- Note 
that causal orders of variables that are not in Khead or K ta u are no later 
than any in K ta u and no earlier than any in Khead- 

OUTPUT: Ordered lists Khead and K ta a and a causal ordering matrix C 



3.2 A new estimation algorithm robust against latent con- 
founders 

Algorithm 1 outputs no causal orders in cases where exogenous variables and 
sink variables as in Lemmas [1] and [2] do not exist. For example, in the left of 
Fig. [TJ there is no such exogenous variable or sink variable that is not affected 
by any latent confounder since the latent confounder j\ affects the exogenous 
variable x\ and the sink variable x^. Therefore, Algorithm 1 would not find 
any causal orders. However, if we omit X4 as in the right of Fig. [TJ and apply 
Algorithm 1 on the remaining x\, x%, X3 only, it will find all the causal orders of 
X\,X2,X3 since fi does not affect any two of x±,X2, X3 and is no longer a latent 
confounder. The same idea applies to the case that x\ is omitted. 

Thus, we propose applying Algorithm 1 on every subset of variables with the 
size larger than one. This enables learning more causal orders than analyzing 
the whole set of variables if a subset of variables has exogenous variables or sink 
variables that are not affected by latent confoundcrs. In practice, Algorithm 1 
could give inconsistent causal orderings between a pair of variables for different 
subsets of variables because of estimation errors. To manage possible incon- 
sistencies in the many causal orderings thus estimated, we rank the obtained 



G 




Figure 1: Left: An example graph where Algorithm 1 finds no causal orders. 
The /i is a latent confounder that affects X\ and X4. Right: Algorithm 1 finds 
the causal orders of x\, X2 and x$ if X4 is omitted and only x%, X2 and X3 are 
analyzed. 



causal ordering matrices by plausibility based on the statistical significances 
(this will be defined below). Then, considering any pair of two variables, we use 
the causal ordering given by the causal ordering matrix which has the highest 
plausibility and does contain an estimated causal ordering (i.e., the ordering 
was not considered unknown) between those two variables. 

We evaluate the plausibility of every causal ordering matrix by the p-value 
of the test statistic created based on Fisher's method combining all the p- values 
computed to estimate the causal orders K^ead and K ta a in Algorithm 1. A 
higher p-value can be considered to be more plausible. The test statistic is 
computed based on X, K^ead and K ta a as follows: 

-2( £ £ log{P H (x m M m) )}+ E E log^^/t"")}). (7) 

TiG-R"h eo d i:k(i)>k(m) m£K tai i i:k(i)<k(m) 

where PH(x m) r^) and Pff(a^, r are the p- values computed to estimate 
ordered lists Khead and K ta u in Algorithm 1. 

Thus, the estimation consists of the following steps: 



Algorithm 2: Applying Algorithm 1 on every subset of variables and merging 
results 



INPUT: Data matrix X and a threshold a 

1. Take all the /-combinations of variable indices {1, ■ • • , d} for I = 2, ■ ■ ■ , d. 
Denote the subsets of variable indices by t (s = 1, ■ • • ,5) and the 
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corresponding data matrices by X^ &set (s = 1, • • • ,S), where S is the 
number of the subsets. 

( s) ( s) 

2. Apply Algorithm 1 on X^ feset using the threshold a to estimate K^ ad , 
^taii an d f° r all s E {1, • • ■ , S}, where K^ ad and K^ u are ordered 
lists of Subset U^\ set and is a causal ordering matrix of Subset 

tj(s) 
subset ' 

3. Compute the p- value of the test statistic in Eq. ([7]) to evaluate the plau- 
sibility of CW for aU s € {1, • • • , S}. 

4. Estimate every element cy (i ^ j) of a causal ordering matrix C by the 
causal ordering between Xi and xj of the causal ordering matrix that has 
the highest plausibility and does contain an estimated causal ordering 
between Xi and Xj, that is, k(i) < k(J) or k(j) < k(i). 

OUTPUT: A causal ordering matrix C 



Algorithm 2 is a brute force approach since it applies Algorithm 1 on ev- 
ery subset (parcel) of variables. We could alleviate the computational load by 
first applying Algorithm 1 on the whole set of variables and then applying Algo- 
rithm 2 on the remaining variables whose causal orders have not been estimated 
after the effects of estimated exogenous variables are removed by regression. 
Thus, we finally propose the following algorithm called ParccLiNGAM: 



Algorithm 3: The ParceLiNGAM algorithm 



INPUT: Data matrix X and a threshold a 

1. Given a d-dimensional random vector x and a d x n data matrix of the 
random vector as X, define U as the set of variable indices of x, i.e., 
{1, • • ■ , d}. initialize a d x d causal ordering matrix C by the zero matrix. 

2. Apply Algorithm 1 on X using the threshold a to estimate Khead and 
K t aii and update C. 

3. Let U res :— U \ (Khead U Ktail)- Denote by C res the corresponding causal 
ordering matrix. Denote by |[/ res | the number of elements in U res . Go to 
Step ©if \U res \ < 2. 

4. Collect variables Xj with j € U res in a vector x res . Collect variables 
Xj with j G Kh ea( i in a vector aj^ead- Perform least squares regressions of 
x head on the i-th element of x res for all i £ U res and collect the residuals in 
the residual matrix R res whose z-th row is given by the residuals regressed 
on Xj. 

5. Apply Algorithm 2 on R res using the threshold a to estimate C res . Re- 
place every (i ^ j) of C by the corresponding element of C res if cy is 
zero and the corresponding element of C res is 1 or -1. 



6. Estimate connection strengths bij if all the non-descendants of 

timated, i.e., the i-th row of C has no zero. This can be done by doing 
multiple regression of Xi on all of its non-descendants Xj with k(j) < k(i). 

OUTPUT: A causal ordering matrix C and a set of estimated connection 
strength bij. 



In cases of no la tent confounders, Alg orithm 3 is essentially equivalent to 



DirectLiNGAM (jShimizu et al.l . l2011l) . Matlab codes for performing Algorithm 3 



are available at http : //www. ar . sanken. osaka-u. ac . jp/~sshimizu/ code/Plingamcode .html 



4 Experiments on artificial data 

We compared our me thod with two estima tion methods for LiN GAM in Eq. J2 1 



called ICA-LiNGAM (jShimizu et al.l . l2006l ) and DirectLiNGAM (jShimizu et al 



20111 ) that do not allow latent confounders and an estimatio n method for la- 



tent v ariable LiNGAM in Eq. (0) called Pairwise LvLiNGAM (jEntner fc Hover . 
l201l[) . If there are no latent confounders, all the methods should estimate cor- 
rect causal orders for large enough sample sizes. The numbers of variables were 
5, 10, and 15, and the sample sizes tested were 500, 1000, and 1500. The orig- 
inal networks used were shown in Fig. [2] to Fig. 01 The ei, e^, e^, eio, ei3, fi 
and fi followed a multimodal asymmetric mixture of two Gaussians, e2, es, es, 
en, ei4, /2 and f§ followed a double exponential distribution, and e^, eg, eg, 
ei2, ei5, /3 and fe followed a multimodal symmetric mixture of two Gaussians. 
The variances of the were set so that var(ei)/var(xi)=l/2. We permuted the 
variables according to a random ordering. The number of trials was 100. The 
significance level a was 0.05. 

First, to evaluate performance of estimating causal orders k{i), we computed 
the percentage of correctly estimated causal orders among estimated causal or- 
ders between two variables (Precision) and the percentage of correctly estimated 
causal orders among actual causal orders between two variables (Recall). We 
also computed the F-measure defined as 2 x Precision x Recall/ (Precision + 
Recall) , which is the harmonic mean of Precision and Recall. The reason why 
only pairwise causal orders were evaluated was that Pairwise LvLiNGAM only 
estimates causal orders of two variables unlike our method and DirectLiNGAM. 
Tables Q] [2] and [3] show the results. Regarding recalls and F-measures, the 
maximal performances when no statistical errors occur are also shown in the 
right-most columns. For example in Fig. [21 Pairwise LvLiNGAM can find all the 
causal orderings except fc(2) < fc(4), fc(2) < fc(5), fc(3) < fc(4) and k(3) < fc(5). 
ParccLiNGAM further can find k(2) < k(5) and k(3) < fc(5) since it estimates 
causal orderings between more than two variables. In some cases, the empirical 
recalls and F-measures were higher than their maximal performances. This is 
because causal orders of some variables that are affected by latent confounders 
happened to be correctly estimated. Regarding precisions and F-measures, our 
method ParceLiNGAM worked best for all the conditions. Regarding recalls, 
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ParceLiNGAM worked best for most conditions and was the second-best but 
comparable to the best method DirectLiNGAM for the other conditions. 

Next, to evaluate the performance in estimating connection strengths fry, we 
computed the root mean square errors between true connection strengths and 
estimated ones. Note that Pairwise LvLiNGAM does not estimate bij. Tabled] 
show the results. Our method was most accurate for all the conditions. 

Table [5] shows average computation times. The amount out computation 
of our ParceLiNGAM was larger than the other methods when the sample size 
was increased. However, its amount of computation can be considered to be 
still tractable. For larger numbers of variables, we would need to select a subset 
of variables to decrease the number of variables to be analyzed. However, this 
selection does not bias results of our method since it allows latent confounders. 



Table 1: Precisions 









Sample size 








500 


1000 


1500 


ParceLiNGAM 


dim 


=5 


1.0 


1.0 


1.0 




dim 


= 10 


0.81 


0.88 


0.93 




dim 


= 15 


0.81 


0.89 


0.92 


PairwiscLvLiNGAM 


dim 


=5 


0.87 


0.94 


0.94 




dim 


= 10 


0.75 


0.79 


0.81 




dim 


= 15 


0.67 


0.76 


0.75 


DirectLiNGAM 


dim 


=5 


0.82 


0.88 


0.85 




dim 


= 10 


0.59 


0.71 


0.73 




dim 


= 15 


0.78 


0.80 


0.82 


ICA-LiNGAM 


dim 


=5 


0.80 


0.75 


0.76 




dim 


= 10 


0.62 


0.62 


0.58 




dim 


= 15 


0.58 


0.59 


0.58 



Table 2: Recalls 









Sample size 


Max. performance 








500 


1000 


1500 




ParceLiNGAM 


dim 


=5 


0.86 


0.82 


0.80 


0.80(8/10) 




dim 


= 10 


0.79 


0.85 


0.91 


0.91(41/45) 




dim 


= 15 


0.80 


0.87 


0.89 


0.94(99/105) 


PairwiscLvLiNGAM 


dim 


=5 


0.65 


0.62 


0.59 


0.60(6/10) 




dim 


= 10 


0.50 


0.55 


0.54 


0.49(22/45) 




dim 


= 15 


0.39 


0.45 


0.43 


0.46(48/105) 


DirectLiNGAM 


dim 


=5 


0.82 


0.88 


0.85 






dim 


= 10 


0.59 


0.71 


0.73 






dim 


= 15 


0.78 


0.80 


0.82 




ICA-LiNGAM 


dim 


=5 


0.80 


0.75 


0.76 






dim 


= 10 


0.62 


0.62 


0.58 






dim 


= 15 


0.58 


0.59 


0.58 





5 Experiments on simulated fMRI data 



Finally, we tested our method on simulated funct ional magnetic resonance imag- 



ing (fMRI) data generated in ([Smith et all 120111) based on a well-known mathe 



matical brain model called the dynamic causal modeling (jFriston. Harrison, fc Penny 
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Tabic 3: F-measures 









Sample size 


Max. performance 








500 


1000 


1500 




ParccLiNGAM 


dim 


=5 


0.92 


0.90 


0.89 


0.89 




dim 


= 10 


0.80 


0.86 


0.92 


0.95 




dim 


= 15 


0.81 


0.88 


0.90 


0.97 


PairwiscLvLiNGAM 


dim 


=5 


0.75 


0.75 


0.72 


0.75 




dim 


= 10 


0.60 


0.65 


0.65 


0.66 




dim 


= 15 


0.49 


0.56 


0.54 


0.63 


DircctLiNGAM 


dim 


=5 


0.82 


0.88 


0.85 






dim 


= 10 


0.59 


0.71 


0.73 






dim 


= 15 


0.78 


0.80 


0.82 




ICA-LiNGAM 


dim 


=5 


0.80 


0.75 


0.76 






dim 


= 10 


0.62 


0.62 


0.58 






dim 


= 15 


0.58 


0.59 


0.58 





Table 4: Root Mean Square Errors 









Sample size 








500 


1000 


1500 


ParccLiNGAM 


dim.- 


=5 


0.030 


0.020 


0.016 




dim.- 


= 10 


0.078 


0.060 


0.052 




dim.- 


= 15 


0.083 


0.046 


0.031 


DircctLiNGAM 


dim. 


=5 


0.22 


0.16 


0.18 




dim.- 


= 10 


0.16 


0.083 


0.089 




dim.; 


= 15 


0.096 


0.074 


0.070 


ICA-LiNGAM 


dim. 


=5 


0.11 


0.11 


0.10 




dim.: 


= 10 


0.16 


0.15 


0.15 




dim.: 


= 15 


0.16 


0.14 


0.13 



Table 5: Computational Times 











Sample size 










500 


1000 


1500 


ParccLiNGAM 


dim 


=5 


0.66 sec. 


1.7 sec. 


4.4 sec. 




dim 


= 10 


10 sec. 


1.5 min. 


8.1 min. 




dim 


= 15 


8.5 min. 


5.3 hrs. 


19 hrs. 


PairwiscLvLiNGAM 


dim 


=5 


0.64 sec. 


2.6 sec. 


7.0 sec. 




dim 


= 10 


2.8 sec. 


12 sec. 


30 sec. 




dim 


= 15 


6.6 sec. 


29 sec. 


74 sec. 


DircctLiNGAM 


dim 


=5 


0.23 sec. 


0.84 sec. 


1.2 sec. 




dim 


= 10 


1.7 sec. 


7.3 sec. 


11 sec. 




dim 


= 15 


6.4 sec. 


29 sec. 


44 sec. 


ICA-LiNGAM 


dim 


=5 


0.12 sec. 


0.051 sec. 


0.047 sec. 




dim 


= 10 


0.34 sec. 


0.18 sec. 


0.10 sec. 




dim 


= 15 


0.81 sec. 


0.68 sec. 


0.53 sec. 
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2003th We used Simulation 2 data and Simulation 6 data. Both datasets con- 



sisted of 10 variables whose causal structure is shown in Fig. El The session 
durations were 10 minutes (200 time points) and 60 minutes (1200 time points), 
respectively. We also created a dataset of 30 minutes (600 time points) by taking 
the first half of Simulation 6 data. Although these data are time-series, we did 
not add lag-b ased approaches including v ector autoregressive mod els into com 
parison as m (|Hvvarinen fc Smith! . 120131) since it was shown by ([Smith et al 



20111) that lag-based methods worked poorly on these Simulation 2 data and 



Simulation 6 data. 

For each of the three different duration settings, we gave the 50 datasets 
(one by one) to ParceLiNGAM, PairwiseLvLiNGAM, DirectLiNGAM and ICA- 
LiNGAM after omitting x\ to create a latent confounder and randomly permut- 
ing the other variables. Table [5] shows the precision, recalls, and F-measures of 
causal orders. Regarding precisions, we excluded such variable pairs Xi and Xj 
that one has no directed path to the other, e.g., x-i and xg, since both k(i) < k(j) 
and k{i) > k(j) are correct. This was because estimation of causal directions is 
the main topic of this paper. The significance level a was 0.05. For all of the 
cases, ParceLiNGAM worked better than the others. 



6 Conclusions 

We proposed a new algorithm for learning causal orders, which is robust against 
latent confounders. In experiments on artificial data and simulated fMRI data, 
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Table 6: Results on simulated fMRI data 





sim2 (10 min.) 


sim6 (30 min.) 


sim6 (60 min.) 


ParccLiNGAM 


Precision 


0.54 


0.56 


0.60 




Recall 


0.53 


0.55 


0.58 




F-mcasure 


0.53 


0.55 


0.59 


PairwiscLvLiNGAM 


Precision 


0.31 


0.25 


0.24 




Recall 


0.22 


0.15 


0.14 




F-mcasure 


0.26 


0.19 


0.18 


DiroctLiNGAM 


Precision 


0.50 


0.51 


0.45 




Recall 


0.50 


0.51 


0.45 




F-mcasure 


0.50 


0.51 


0.45 


ICA-LiNGAM 


Precision 


0.49 


0.47 


0.47 




Recall 


0.49 


0.47 


0.47 




F-mcasure 


0.49 


0.47 


0.47 



our methods learned more causal orders correctly than existing methods. An 
important problem for future research is to develop computationally more effi- 
cient algorithms. One approach might be to develop a divide-and-conquer al- 
gorithm that divides variables into subsets with moderate numbers of variables 
and integrates the estimation results on the subsets. 
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Appendix 

We first give Darmois-Skitovitch theorem ( Darmoisl . 1 19531 : ISkitovitcb . 19531 ): 

Theorem 1 (Darmois-Skitovitch theorem (D-S theorem)) Define two ran- 
dom variables y\ and yi as linear combinations of independent random variables 
Si(i=l, ■■■ , q): j/i = Yh=i a i s *> V2 = Yh=i A s *- Then, if y x and y 2 are inde- 
pendent, all variables Sj for which ctjftj ^ are Gaussian, rj 

In other words, this theorem means that if there exists a non-Gaussian Sj for 
which Oj/Sj^O, j/i and 2/2 are dependent. 
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Proof of Lemma [TJ 

i) Assume that Xj has at least one parent observed variable or latent con- 
founder. Let Pj denote the set of the parent variables of Xj . Then one can write 
Xj=^2 Ph< zp. WjhPh+ej, where the parent variables ph are independent of ej and 
the coefficients Wjh are non-zero. Let a vector xp. and a column vector wp. 
collect all the variables in Pj and the corresponding connection strengths, respec- 
tively. Then, the covariances between xp j andxj are EfapjXj) = B{xp J .(iOp,xp j + 
ej)} = £J(xp.Xp.)i«p.. The covariance matrix ^(xp^Xp.) is positive definite 
since the external influences and latent confounders are mutually independent 
and have positive variances. Thus, the covariance vector E(x.p j x j) = E(xp j xp'.)wp j 
above cannot equal the zero vector, and there must be at least one variable in 
Pj with which Xj covaries. 

i-a) Suppose that X{ is a parent of Xj in Pj that covaries with Xj. For such 
Xi, we have 



(j) cov(xi,Xj) 
r\ J ' = Xi — — f-x 



j 



(8) 



COv(xi, Xj) . v-^ \ , nS 

~^f( E "ihPh + ei) (9) 



cov(xi , Xj ) 1 cov(a;i , Xj 



var(xj) J var(xj) 

cov(xi , Xj ) 
vav(xj) Jr 



(10) 



Each of those parent variables (including x{) in Pj is a linear combination of 

external influences other than ej and latent confounders that are non-Gaussian 

(j) 

and independent. Thus, the r\ and Xj can be written as linear combinations 
of non-Gaussian and independent external influences including ej and latent 
confounders. Further, the coefficient of ej on r. is non-zero since cov(cCj;, Xj) ^ 
aforementioned and that on Xj is one by definition. These imply that r\ and 

( 7 1 

Xj are dependent since r\ , Xj and ej correspond to y±, j/2, Sj in D-S theorem, 
respectively. 

i-b) Next, suppose that Xj has a latent confounder fk in Pj that covaries with 
Xj . The latent confounder fk should have a non-zero coefficient on at least one 
other observed variable Xi . Without loss of generality, it is enough to consider 
two observed variable cases that we only observe Xi and Xj : 

Xi = bijXj + \ikfk + ei + E ^ihfh (11) 

h^k 

Xj = bjjXj + Ajfc/fc + e i + ^ \lfl, (12) 

where Xik and Xjk are non-zero since fk is a latent confounder of Xi and Xj. 
Since the model is acyclic, bijbji = 0. 
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First, suppose that b^ is zero. Then, we have 



.0') 



cov(xi, Xj) 

Xi - - Xj 

va,v{xj) 

cov(xj,Xj) lu 
\\k ZZZ7Z \ — V°ji A ifc 



-{1 



var(x i ) 
cov(a;i,Xj) 



y ^jk)}fk 
COv(Xi,Xj) 



v&r(xj) JZ ' v&r(xj) 



Di, 



(13) 



(14) 



where D\ is a linear combinations of non-Gaussian and independent latent con- 
founders other than If cov(xi,xj) is zero, the coefficient of on is 



Xik and is non-zero. If cav(xi,Xj) is non-zero, the coefficient of ej on is 
— ° °^^ x ^ and is non-zero. Thus, in both of the cases, rf and xj are depen- 
dent due to D-S theorem. Remember that the coefficient of ej on Xj is one by 
definition. 

Next, suppose that bji is zero. Then, we have 



var(xj) 

= {(hijXjk + Xik) /' / Xjk}fk 

V&Y(Xj) 

cav(xi, Xj ) 
+ei + (by varfo) )&3 



Do 



(15) 



(16) 



where D2 is a linear combinations of non-Gaussian and independent latent con- 
founders other than If cov(xj,Xj) is zero and bij is zero, the coefficient of 
ff. on is Xik and is non-zero. If cov(xi, Xj) is zero and b%j is non-zero, the 
coefficient of ej on r\ is 6^ and is non-zero. If cov(xi,Xj) is non-zero and bij 

and is non-zero. If cov(xi,Xj) 



is zero, the coefficient of e 3 on is 



covf^ij^j) 
r(x 3 ) 



is non-zero and bij is non-zero, either of the followings holds: a) the coefficient 
of ej on rj is non-zero, that is, bij ^ cov(xi, Xj)/var(xj) or b) the coefficient 
of ej on r\ is zero and hence the coefficient of fk on r\ is Xik and is non-zero. 
Thus, in all of the cases, r\ and x j fire dependent due to D-S theorem. 

ii) The converse of contrapositive of i) is straightforward using the model 
definition. From i) and ii), the lemma is proven. ■ 

Proof of Lemma [2] 

i) Assume that a variable Xj has at least one child observed variable or latent 
confoundcr. First, without loss of generality, one can write 



x 



(-3) 



(I-B)- 1 (A/ + e) =A(A/ + e) 



1 

a (-3)3 



a j{-i) 
A (-3) 



A (-3)/ 



^3 

e (-3) 



(17) 
(18) 
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where each of A (= (I — B) _1 ) and A-(-j) is invertible and can be permuted to 
be a lower triangular matrix with the diagonal elements being ones if the rows 
and columns are simultaneously permuted according to the causal ordering k(i). 
The same applies to the inverse of A: 



-D- 1 ' 



(-j)j 



-a 1 , ,D' 

o(-o) 

D 1 



(19) 



where D 
Then, 



H-j) ~ a (.-j)j a j(-j) 



Thus, 1 - a 



ff C-i)i S (-i) !B ( 



-j) 



In Eq.(22 



r (-iW Zj (-i)t"(-ib' 
a J(-j) A (-i) 



e (-j)) 

e j) + A (-i)( A (-j)/' 



S (-,-)( a (-JW A i +A 



-3)3' 



i-3) 



(20) 
(21) 

A (-,))}/ 



[X] 

T , then we have 



if a T , > 

3(-3) 



°"(-j)j S (-j) A (-J) 



T 



A (-i) a (-i)j} e j( 23 ) 
(24) 



Thus, the coefficient of ej on rj 3 '' is one. Now, suppose that Xj has a child 

( — i) 

Xi. If the coefficient of ej on Xi is non-zero, rj and a^-j) are dependent 
due to D-S theorem. Even if it is zero, i.e., cancelled out to be zero by special 
parameter values of the connection strengths, the coefficient of ej on at least 



one other variable in x 



(-J) 



is non-zero since there must be such an observed 



variable to cancel out the coefficient of ej on xi to be zero. It implies that 
Tj' and are dependent due to D-S theorem. Next, suppose that Xj has 

a latent confounder /j. Then, in Eq. (|2"4")) . the corresponding element in Xj is 
not zero, i.e., the coefficient of fi on r) is not zero. Further, /j has a non- 



zero coefficient on at least one variable in x< 



(—1) 

confounders. Therefore, rj- and 

On the other hand, in Eq. ([22|) . if aj,. 



-./) 



due to the definition of latent 



are dependent due to D-S theorem. 
- - T ^7^. ) A(_j) ^ T , at least 

C-i) 



one of the coefficients of the elements in &{-j) on J; is not zero. By definition, 
every element in ei_j\ has a non-zero coefficient on the corresponding element 

in Thus, ; and X(_j) are dependent due to D-S theorem. 

ii) The converse of contrapositive of i) is straightforward using the model 
definition. From i) and ii), the lemma is proven. ■ 
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